1. Introduction

Constructing good tests for statistical hypotheses is an essential problem of statistics. There are two main approaches to constructing test statistics. In the first approach, roughly speaking, some measure of distance between the theoretical and the corresponding empirical distributions is proposed as the test statistic. Classical examples of this approach are the Cramér-von Mises and the Kolmogorov-Smirnov statistics. Although these tests work and are capable of giving very good results, each of them is asymptotically optimal only in a finite number of directions of alternatives to a null hypothesis (see

Nowadays, there is increasing interest in the second approach to constructing test statistics. The idea of this approach is to construct tests in such a way that they are asymptotically optimal. Test statistics constructed following this approach are often called (efficient) score test statistics.



The pioneer of this approach was Neyman (1937), and then many other works followed: Le Cam (1956), Neyman (1959), Cox and Hinkley (1974), Bickel and Ritov (1992), Ledwina (1994). This approach is also closely related to the theory of efficient (adaptive) estimation: Bickel et al. (1993), Ibragimov and Has'minskii (1981). Score tests are asymptotically optimal in the sense of intermediate efficiency in an infinite number of directions of alternatives (see Inglot and Ledwina (1996)) and show good overall performance in practice (Kallenberg and Ledwina (1995), Kallenberg and Ledwina (1997)).

We have described the situation in classical hypothesis testing, i.e., testing hypotheses about random variables $X_1, \ldots, X_n$ whose values are directly observable. But it is important from a practical point of view to be able to construct tests for situations where $X_1, \ldots, X_n$ are corrupted or can only be observed with an additional noise term. Problems of this kind are termed statistical inverse problems. The most well-known example here is the deconvolution problem. This problem appears when one has noisy signals or measurements: in physics, seismology, optics and imaging, engineering. It is a building block for many complicated statistical inverse problems.

Due to the importance of the deconvolution problem, testing statistical hypotheses related to this problem has been widely studied in the literature. But, to our knowledge, all the proposed tests were based on some kind of distance (usually an $L_2$-type distance) between the theoretical density function and the empirical estimate of the density (see, for example, Bickel and Rosenblatt (1973), Delaigle and Gijbels, Holzmann et al. (2007)). Thus, only the first approach described above was implemented for the deconvolution problem.

In this paper, we treat the deconvolution problem with the second approach. We construct efficient score tests for the problem. In classical hypothesis testing, it was shown that for applications of efficient score tests it is important to select the right number of components in the test statistic (see Bickel and Ritov (1992), Eubank et al. (1993), Kallenberg and Ledwina (1995), Fan (1996)). Thus, we provide a corresponding refinement of our tests. Following the solution proposed in Kallenberg (2002), we make our tests data-driven, i.e., the tests are capable of choosing a reasonable number of components in the test statistics automatically from the data.

In Section 2 we formulate the simple deconvolution problem. In Section 3 we construct the score tests for the parametric deconvolution hypothesis. In Section 5 we prove consistency of our tests against nonparametric alternatives. In Section 6 we turn to deconvolution with an unknown error density. We derive the efficient scores for the composite parametric deconvolution hypothesis in Section 7. In Section 8 we construct the efficient score tests for this case. In Section 9 we make our tests data-driven. In Section 10 we prove consistency of the tests against nonparametric alternatives. Additionally, in Sections 5 and 10 we explicitly characterize the class of nonparametric alternatives for which our tests are inconsistent and against which they therefore should not be used. Some simple examples of applications of the theory are also presented in this paper.



2. Notation and basic assumptions 

The problem of testing whether i.i.d. real-valued random variables $X_1, \ldots, X_n$ are distributed according to a given density $f$ is classical in statistics. We consider a more difficult problem, namely the case when $X_i$ can only be observed with an additional noise term, i.e., instead of $X_i$ one observes $Y_i$, where

$$Y_i = X_i + \varepsilon_i,$$

and the $\varepsilon_i$ are i.i.d. with a known density $h$ with respect to the Lebesgue measure $\lambda$; also $X_i$ and $\varepsilon_i$ are independent for each $i$, and $E\varepsilon_i = 0$, $0 < E\varepsilon_i^2 < \infty$. For brevity of notation, say that $X_i$, $Y_i$, $\varepsilon_i$ have the same distributions as random variables $X$, $Y$, $\varepsilon$, respectively. Assume that $X$ has a density with respect to $\lambda$.

Our null hypothesis $H_0$ is the simple hypothesis that $X$ has a known density $f_0$ with respect to $\lambda$. The most general possible nonparametric alternative hypothesis $H_A$ is that $f \neq f_0$. Since this class of alternatives is too broad, we first consider a special class of submodels of the model described above. In this paper we will at first assume that all possible alternatives from $H_A$ belong to some parametric family. Then we will propose a test that is expected to be asymptotically optimal (in some sense) against the alternatives from this parametric family. However, we will prove that our test is also consistent against other alternatives, even if they do not belong to the initial parametric family. The test is therefore applicable in many nonparametric problems. Moreover, the test is expected to be asymptotically optimal (in some sense) for testing against an infinite number of directions of nonparametric alternatives (see Inglot and Ledwina (1996)). This is the general plan for our construction.



3. Score test for simple deconvolution 

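Suppose that all possible densities of $X$ belong to some parametric family $\{f_\theta\}$, where $\theta$ is a $k$-dimensional Euclidean parameter and $\Theta \subset \mathbb{R}^k$ is a parameter set. Then all the possible densities $q(y;\theta)$ of $Y$ have in such a model the form

$$q(y;\theta) = \int_{\mathbb{R}} f_\theta(s)\, h(y-s)\, ds. \tag{1}$$

The score function $\dot{l}$ is defined as

$$\dot{l}(y;\theta) = \frac{\dot{q}(\theta)}{q(\theta)}\, 1_{[q(\theta) > 0]}, \tag{2}$$

where $q(\theta) := q(y;\theta)$ and $\dot{l}(\theta) := \dot{l}(y;\theta)$ for brevity. The Fisher information matrix of the parameter $\theta$ is defined as

$$I(\theta) = \int_{\mathbb{R}} \dot{l}^{\,T}(y;\theta)\, \dot{l}(y;\theta)\, dQ_\theta(y). \tag{3}$$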

Definition 1. Call our problem a regular deconvolution problem if

(B1) for all $\theta \in \Theta$, $q(y;\theta)$ is continuously differentiable in $\theta$ for $\lambda$-almost all $y$, with gradient $\dot{q}(\theta)$

(B2) $|\dot{l}(\theta)| \in L_2(\mathbb{R}, Q_\theta)$ for all $\theta \in \Theta$

(B3) $I(\theta)$ is nonsingular for all $\theta \in \Theta$ and continuous in $\theta$.

If $\theta$ is a true parameter value, call such a model $\mathcal{M}_k(\theta)$ and denote by $Q_\theta$ the probability distribution function and by $E_\theta$ the expectation corresponding to the density $q(\cdot\,;\theta)$.

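If conditions (B1)-(B3) hold, then by Proposition 1, p. 13 of Bickel et al. (1993) we calculate for all $y \in \operatorname{supp} q(\cdot\,;\theta)$

$$\dot{l}(y;\theta) = \frac{\frac{\partial}{\partial\theta}\left(\int_{\mathbb{R}} f_\theta(s)\, h(y-s)\, ds\right)}{\int_{\mathbb{R}} f_\theta(s)\, h(y-s)\, ds}. \tag{4}$$

Then for $y \in \operatorname{supp} q(\cdot\,;0)$ the efficient score vector for testing $H_0: \theta = 0$ is

$$l^*(y) := \dot{l}(y;0) = \frac{\frac{\partial}{\partial\theta}\left(\int_{\mathbb{R}} f_\theta(s)\, h(y-s)\, ds\right)\Big|_{\theta=0}}{\int_{\mathbb{R}} f_0(s)\, h(y-s)\, ds}. \tag{5}$$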

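Set

$$L = \left\{ E_0\, [l^*(Y)]^T\, l^*(Y) \right\}^{-1} \tag{6}$$

and

$$U_k = \left\{ \frac{1}{\sqrt{n}} \sum_{j=1}^{n} l^*(Y_j) \right\} L \left\{ \frac{1}{\sqrt{n}} \sum_{j=1}^{n} l^*(Y_j) \right\}^T. \tag{7}$$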

Theorem 1. For the regular deconvolution problem the efficient score vector $l^*$ for testing $H_0: \theta = 0$ in $\mathcal{M}_k(\theta)$ is given for all $y \in \mathbb{R}$ by (5). Moreover, under $H_0: \theta = 0$ we have $U_k \to_d \chi^2_k$ as $n \to \infty$.

We construct the test based on the test statistic $U_k$ as follows: the null hypothesis $H_0$ is rejected if the value of $U_k$ exceeds standard critical points of the $\chi^2_k$-distribution. Note that we do not need to estimate the scores $l^*$.

Corollary 2. If the deconvolution problem is regular and $f_\theta(\cdot)$ is differentiable in $\theta$ for all $\theta \in \Theta$, then the conclusions of Theorem 1 are valid and the efficient score vector for testing $H_0: \theta = 0$ can be calculated by the formula

$$l^*(y) = \frac{\int_{\mathbb{R}} \frac{\partial}{\partial\theta} f_\theta(s)\big|_{\theta=0}\, h(y-s)\, ds}{\int_{\mathbb{R}} f_0(s)\, h(y-s)\, ds}. \tag{8}$$

Example 1. Consider one important special case. Assume that each submodel of interest is given by the following restriction: all possible densities $f$ of $X$ belong to a parametric exponential family, i.e., $f = f_\theta$ for some $\theta$, where

$$f_\theta(x) = f_0(x)\, b(\theta) \exp(\theta \circ u(x)), \tag{9}$$

where the symbol $\circ$ denotes the inner product in $\mathbb{R}^k$, $u(x) = (u_1(x), \ldots, u_k(x))$ is a vector of known Lebesgue measurable functions, $b(\theta)$ is the normalizing factor and $\theta \in \Theta \subset \mathbb{R}^k$. We assume that the standard regularity assumptions on exponential families (see Barndorff-Nielsen (1978)) are satisfied. All the possible densities $q(y;\theta)$ of $Y$ have in such a model the form

$$q(y;\theta) = \int_{\mathbb{R}} f_0(s)\, b(\theta) \exp(\theta \circ u(s))\, h(y-s)\, ds. \tag{10}$$

These densities no longer need to form an exponential family. If we assume, for example, that $h > 0$ $\lambda$-almost everywhere on $\mathbb{R}$, that the functions $f_0, h, u_1, \ldots, u_k$ are bounded and $\lambda$-measurable, and that there exists an open subset $\Theta_1 \subset \Theta$ such that $|\dot{l}(y;\theta)| \in L_2(Q_\theta)$ and the Fisher information matrix is nonsingular and continuous in $\theta$, then conditions (B1)-(B3) are satisfied for this problem and the previous results are applicable. The score vector for the problem is

$$l^*(y) = \frac{\int_{\mathbb{R}} u(s)\, f_0(s)\, h(y-s)\, ds}{\int_{\mathbb{R}} f_0(s)\, h(y-s)\, ds} - \int_{\mathbb{R}} u(s)\, f_0(s)\, ds. \tag{11}$$

In other words, if we denote by $*$ the standard convolution of functions,

$$l^*(y) = \frac{(u f_0) * h}{f_0 * h}(y) - E_0 u(X). \tag{12}$$

Let $L$ be defined by (6) and

$$U_k = \left\{ \frac{1}{\sqrt{n}} \sum_{j=1}^{n} l^*(Y_j) \right\} L \left\{ \frac{1}{\sqrt{n}} \sum_{j=1}^{n} l^*(Y_j) \right\}^T. \tag{13}$$

This is the score test statistic designed to be asymptotically optimal for testing $H_0$ against the alternatives from the exponential family (9). Its asymptotic distribution under the null hypothesis $H_0$ is given by Theorem 1.
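
As a rough numerical illustration of the score of Example 1 and of the resulting statistic, the following Python sketch evaluates $((u f_0) * h)/(f_0 * h) - E_0 u(X)$ by Riemann sums for $k = 1$. The Gaussian choices $f_0 = N(0,1)$ and $h = N(0, \sigma^2)$, the direction $u(x) = x$, the integration grid and all function names are assumptions made only for this illustration; they are not taken from the paper. In this Gaussian case the score has the closed form $y/(1+\sigma^2)$, which gives a check on the numerics.

```python
import math

def normal_pdf(x, sd=1.0):
    """Density of N(0, sd^2) at x."""
    return math.exp(-0.5 * (x / sd) ** 2) / (sd * math.sqrt(2.0 * math.pi))

def efficient_score(y, f0, h, u, grid, e0_u):
    """Score ((u f0) * h)(y) / (f0 * h)(y) - E_0 u(X), by Riemann sums."""
    num = sum(u(s) * f0(s) * h(y - s) for s in grid)
    den = sum(f0(s) * h(y - s) for s in grid)
    return num / den - e0_u  # the grid step cancels in the ratio

def score_statistic(ys, f0, h, u, grid):
    """U_1 for scalar u, with L = 1 / E_0[l*(Y)]^2 computed under the null."""
    step = grid[1] - grid[0]
    e0_u = step * sum(u(s) * f0(s) for s in grid)
    info = 0.0
    for y in grid:
        q0 = step * sum(f0(s) * h(y - s) for s in grid)  # null density (f0 * h)(y)
        info += step * efficient_score(y, f0, h, u, grid, e0_u) ** 2 * q0
    total = sum(efficient_score(y, f0, h, u, grid, e0_u) for y in ys)
    return total ** 2 / (len(ys) * info)

# Assumed setting: f0 = N(0, 1), h = N(0, 0.5^2), u(x) = x.
sigma = 0.5
grid = [-8.0 + 0.04 * i for i in range(401)]
f0 = normal_pdf
h = lambda x: normal_pdf(x, sd=sigma)
u = lambda x: x

score_at_one = efficient_score(1.0, f0, h, u, grid, e0_u=0.0)
u1 = score_statistic([1.0, 2.0, -0.5], f0, h, u, grid)  # toy observations
```

Under these assumptions `score_at_one` should be close to $1/(1 + 0.25) = 0.8$, and `u1` close to $(\sum_j Y_j)^2 / (n(1+\sigma^2))$. For other choices of $f_0$, $h$ and $u$ the grid would need to be adapted to the supports involved.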



4. Selection rule 

For the use of score tests in classical hypothesis testing it was shown (see the Introduction) that it is important to select the right dimension $k$ of the space of possible alternatives. An incorrect choice of the model dimension can substantially decrease the power of a test. In Section 5 we give a theoretical explanation of this fact for the case of deconvolution. A possible solution of this problem is to equip the test statistic of interest with some procedure (called a selection rule) that chooses a reasonable dimension of the model automatically from the data. See Kallenberg (2002) for an extensive discussion and practical examples. In this section we implement this idea for testing the deconvolution hypothesis. First we give a definition of a selection rule, generalizing ideas from Inglot and Ledwina (2009).

Denote by $\mathcal{M}_k(\theta)$ the model described in Section 3 such that the true parameter $\theta$ belongs to the parameter set, say $\Theta_k$, and $\dim \Theta_k = k$. By a nested family of submodels $\mathcal{M}_k(\theta)$ for $k = 1, 2, \ldots$ we mean a sequence of these models such that for their parameter sets it holds that $\Theta_1 \subset \Theta_2 \subset \ldots$.

Definition 2. Consider a nested family of submodels $\mathcal{M}_k(\theta)$ for $k = 1, \ldots, d$, where $d$ is fixed but otherwise arbitrary. Choose a function $\pi(\cdot,\cdot): \mathbb{N} \times \mathbb{N} \to \mathbb{R}$, where $\mathbb{N}$ is the set of natural numbers. Assume that $\pi(1,n) < \pi(2,n) < \ldots < \pi(d,n)$ for all $n$ and $\pi(j,n) - \pi(1,n) \to \infty$ as $n \to \infty$ for every $j = 2, \ldots, d$. Call $\pi(j,n)$ a penalty attributed to the $j$-th model $\mathcal{M}_j(\theta)$ and the sample size $n$. Then a selection rule $S$ for the test statistic $U_k$ is an integer-valued random variable satisfying the condition

$$S = \min\{k : 1 \le k \le d;\; U_k - \pi(k,n) \ge U_j - \pi(j,n),\; j = 1, \ldots, d\}. \tag{14}$$

We call $U_S$ a data-driven efficient score test statistic for testing validity of the initial model.
From Theorem 8 below it follows that for our problem (as well as in the classical case, see Kallenberg (2002)) many possible penalties lead to consistent tests. So the choice of the penalty should be dictated by external practical considerations. Our simulation study is not extensive enough to recommend the most practically suitable penalty for the deconvolution problem. Possible choices are, for example, Schwarz's penalty $\pi(j,n) = j \log n$, or Akaike's penalty $\pi(j,n) = j$.
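
The selection rule is straightforward to compute once the statistics $U_1, \ldots, U_d$ are available. The following Python sketch picks the smallest $k$ that maximizes the penalized statistic $U_k - \pi(k,n)$, using Schwarz's penalty by default. The function names and the numerical values of the statistics are hypothetical and serve only as an illustration.

```python
import math

def schwarz_penalty(j, n):
    """Schwarz's penalty pi(j, n) = j * log(n)."""
    return j * math.log(n)

def selection_rule(stats, n, penalty=schwarz_penalty):
    """Smallest k in 1..d maximising stats[k-1] - penalty(k, n)."""
    penalised = [u - penalty(k, n) for k, u in enumerate(stats, start=1)]
    best = max(penalised)
    return penalised.index(best) + 1  # list.index returns the first maximiser

u_stats = [0.8, 1.1, 14.9, 15.2]  # hypothetical values of U_1, ..., U_4
s = selection_rule(u_stats, n=100)
data_driven_statistic = u_stats[s - 1]  # this is U_S
```

With these made-up values the jump from $U_2$ to $U_3$ outweighs the extra penalty, while the small gain from $U_3$ to $U_4$ does not, so the rule stops at dimension 3.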

Denote by $P_0^n$ the probability measure corresponding to the case when $X_1, \ldots, X_n$ all have the density $f_0$. For simplicity of notation we will sometimes omit the index "n" and write simply $P_0$. The main result about the asymptotic null distribution of $U_S$ is the following.

Theorem 3. Suppose that assumptions (B1)-(B3) hold. Then under the null hypothesis $H_0$ it holds that $P_0(S > 1) \to 0$ and $U_S \to_d \chi^2_1$ as $n \to \infty$.

Remark 4. The selection rule $S$ can be modified in order to make it possible to choose not only models of dimension less than some fixed $d$, but to allow arbitrarily large dimensions of $\mathcal{M}_k(\theta)$ as $n$ grows to infinity. In this case an analogue of Theorem 3 still holds, but the proof becomes more technical and one should take care of the possible rates of growth of the model dimension. Still, one can argue that even $d = 10$ is often enough for practical purposes (see Kallenberg and Ledwina (1995)).



5. Consistency of tests 

Let $F$ be a true distribution function of $X$. Here $F$ is not necessarily parametric and possibly does not have a density with respect to $\lambda$. Let us choose for every $k \le d$ an auxiliary parametric family $\{f_\theta\}$, $\theta \in \Theta \subset \mathbb{R}^k$, such that $f_0$ from this family coincides with $f_0$ from the null hypothesis $H_0$. Suppose that the chosen family $\{f_\theta\}$ gives us a regular deconvolution problem in the sense of Definition 1. Then one is able to construct the score test statistic $U_k$ defined by (7), despite the fact that the true $F$ possibly has no relation to the chosen $\{f_\theta\}$. One can use the exponential family from Example 1 as $\{f_\theta\}$, or some other parametric family, whatever is convenient. Our goal in this section is to determine under what conditions the $U_k$ thus built will be consistent for testing against $F$.

Suppose that the following condition holds:

(D1) there exists an integer $K \ge 1$ such that $K \le d$ and
$E_F l_1^* = 0, \ldots, E_F l_{K-1}^* = 0, \; E_F l_K^* = C_K \neq 0$,

where $l_i^*$ is the $i$-th coordinate function of $l^*$, $l^*$ is defined by (5), $d$ is the maximal possible dimension of our model as in Definition 2 of Section 4, and $E_F$ denotes the mathematical expectation with respect to $F * h$.

Condition (D1) is a weak analog of nondegeneracy: if (D1) fails for all $k$, then $F$ is orthogonal to the whole system $\{l_i^*\}_{i=1}^{\infty}$, and if this system is complete, then $F$ is degenerate. Also, (D1) is related to the identifiability of the model (see the beginning of Section 10 for more details).

We start with an investigation of the consistency of $U_k$, where $k$ is some fixed number, $1 \le k \le d$. The following result shows why it is important to choose the right dimension of the model.

Proposition 5. Let (D1) hold. Then for all $1 \le k \le K - 1$, if $F$ is the true distribution function of $X$, then $U_k \to_d \chi^2_k$ as $n \to \infty$.

This result and Theorem 1 show that if the dimension of the model is too small, then the test does not work, since it does not distinguish between $F$ and $f_0$.

Proposition 6. Let (D1) hold. Then for $k \ge K$, if $F$ is the true distribution function of $X$, then $U_k \to \infty$ in probability as $n \to \infty$.

Now we turn to the data-driven statistic $U_S$. Suppose that the selection rule $S$ is defined as in Section 4. Assume that

(S1) for every fixed $k \ge 1$ it holds that $\pi(k,n) = o(n)$ as $n \to \infty$.

Denote by $P_F$ the probability measure corresponding to the case when $X_1, \ldots, X_n$ all have the distribution $F$. Consider consistency of the "adaptive" test based on $U_S$.

Proposition 7. Let (D1) and (S1) hold. If $F$ is the true distribution function of $X$, then $P_F(S \ge K) \to 1$ and $U_S \to \infty$ as $n \to \infty$.

The main result of this section is the following.

Theorem 8.

1. The test based on $U_k$ is consistent for testing against all alternative distributions $F$ such that (D1) is satisfied with $K \le k$.

2. The test based on $U_k$ is inconsistent for testing against all alternative distributions $F$ such that (D1) is satisfied with $K > k$.

3. If the selection rule $S$ satisfies (S1), then the test based on $U_S$ is consistent against all alternative distributions $F$ such that (D1) is satisfied with some $K$.
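
To make the role of condition (D1) concrete, consider the following worked example. The Gaussian choices below are assumptions made for illustration and do not come from the text above. Take $f_0 = N(0,1)$, $h = N(0,\sigma^2)$ and the exponential family of Example 1 with $u_1(x) = x$. Then $E_0 u_1(X) = 0$, and the first coordinate of the score in Example 1 is

$$l_1^*(y) = \frac{((u_1 f_0) * h)(y)}{(f_0 * h)(y)} = E_0[X \mid Y = y] = \frac{y}{1+\sigma^2}.$$

For any alternative $F$ with $\int x \, dF(x) = 0$ we get, since $E\varepsilon = 0$,

$$E_F\, l_1^*(Y) = \frac{\int x \, dF(x) + E\varepsilon}{1+\sigma^2} = 0.$$

Hence, if (D1) holds for such an $F$ at all, it holds with $K \ge 2$, and by part 2 of Theorem 8 the test based on $U_1$ is inconsistent against $F$. Under (S1), part 3 says that the data-driven test based on $U_S$ is still consistent against $F$, provided (D1) holds with some $K \le d$.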

6. Composite deconvolution 

In the previous sections, we treated the simplest case of the deconvolution problem. The next sections are devoted to the more realistic case of an unknown error density. Our main ideas and constructions will be similar to the ones for the simple case. Our goal is to modify the techniques and constructions from the simple hypothesis case in order to apply them in the new situation. To do this we will have to impose additional regularity assumptions of uniformity on our new model. These assumptions are quite standard in statistics. They are the price we pay for keeping simple and general constructions in the more complicated problem. We will also have to modify the scores we used in the simple case. The modified scores are called efficient scores.

Despite all the changes, we will still be able to build a selection rule for the new problem. We will need a new, modified definition of the selection rule. A large part of the uniformity assumptions on the new model will be needed not to build an efficient score test, but to make such a test data-driven (see Section 9).

Consider the situation described in the first paragraph of Section 2, but with the following complication introduced. Suppose from now on that the density $h$ of $\varepsilon$ is unknown.

Then the most general possible null hypothesis $H_0$ in this setup is that $f = f_0$ and the error $\varepsilon$ has expectation 0 and finite variance. The most general alternative hypothesis $H_A$ is that $f \neq f_0$. Since both $H_0$ and $H_A$ are in this case too broad, we first consider a special class of submodels of the model described above. At first we assume that all possible densities $f$ of $X$ belong to some specific and preassigned parametric family $\{f_\theta\}$, i.e., $f = f_\theta$ for some $\theta$, where $\theta$ is a $k$-dimensional Euclidean parameter and $\Theta \subset \mathbb{R}^k$ is a parameter set for $\theta$. Our starting assumption about the density of the error $\varepsilon$ will be that $h$ belongs to some specific parametric family $\{h_\eta\}$, where $\eta \in \Lambda$ and $\Lambda \subset \mathbb{R}^m$ is a parameter set. Thus, $\eta$ is a nuisance parameter. The null hypothesis $H_0$ is the following composite hypothesis: $X$ has a particular density $f_0$ with respect to $\lambda$.

Then we will propose a test that is expected to be asymptotically optimal 
(in some sense) for testing in this parametric situation. After that we will prove 
that our test is consistent also against a wide class of nonparametric alternatives. 
Moreover, the test is expected to be asymptotically optimal (in some sense) for 
testing against an infinite number of directions of nonparametric alternatives. 
This is essentially the same plan as for the simple case. 

If $(\theta, \eta)$ is a true parameter value, we call such a submodel $\mathcal{M}_{k,m}(\theta,\eta)$. Denote in this case the density of $Y$ by $g(\cdot\,;(\theta,\eta))$ and the corresponding expectation by $E_{(\theta,\eta)}$. Let the null hypothesis $H_0$ be $\theta = \theta_0$, where it is assumed that $\theta_0 \in \Theta$. Then the alternative hypothesis $\theta \neq \theta_0$ is a parametric subset of the original general and nonparametric alternative hypothesis $H_A$.

7. Efficient scores

All possible densities $g(y;(\theta,\eta))$ of $Y$ have in our model the form

$$g(y;(\theta,\eta)) = \int_{\mathbb{R}} f_\theta(s)\, h_\eta(y-s)\, ds. \tag{15}$$

It is not always possible to identify $\theta$ and/or $\eta$ in this model. Since we are concerned with testing hypotheses and not with estimation of parameters, it is not necessary for us to impose a restrictive assumption of identifiability on the model. We will need only a (weaker) consistency condition to build a sensible test (see Section 10).



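The score function for $(\theta,\eta)$ at $(\theta_0,\eta_0)$ is defined as (see Bickel et al. (1993), p. 28):

$$\dot{l}_{\theta_0,\eta_0}(y) = \left(\dot{l}_{\theta_0}(y),\, \dot{l}_{\eta_0}(y)\right), \tag{16}$$

where $\dot{l}_{\theta_0}$ is the score function for $\theta$ at $\theta_0$ and $\dot{l}_{\eta_0}$ is the score function for $\eta$ at $\eta_0$, i.e.

$$\dot{l}_{\theta_0}(y) = \frac{\frac{\partial}{\partial\theta}\, g(y;(\theta,\eta_0))\big|_{\theta=\theta_0}}{g(y;(\theta_0,\eta_0))}\, 1_{[y:\, g(y;(\theta_0,\eta_0)) > 0]} = \frac{\frac{\partial}{\partial\theta}\left(\int_{\mathbb{R}} f_\theta(s)\, h_{\eta_0}(y-s)\, ds\right)\Big|_{\theta=\theta_0}}{\int_{\mathbb{R}} f_{\theta_0}(s)\, h_{\eta_0}(y-s)\, ds}\, 1_{[y:\, g(y;(\theta_0,\eta_0)) > 0]}, \tag{17}$$

$$\dot{l}_{\eta_0}(y) = \frac{\frac{\partial}{\partial\eta}\, g(y;(\theta_0,\eta))\big|_{\eta=\eta_0}}{g(y;(\theta_0,\eta_0))}\, 1_{[y:\, g(y;(\theta_0,\eta_0)) > 0]} = \frac{\frac{\partial}{\partial\eta}\left(\int_{\mathbb{R}} f_{\theta_0}(s)\, h_{\eta}(y-s)\, ds\right)\Big|_{\eta=\eta_0}}{\int_{\mathbb{R}} f_{\theta_0}(s)\, h_{\eta_0}(y-s)\, ds}\, 1_{[y:\, g(y;(\theta_0,\eta_0)) > 0]}. \tag{18}$$

The Fisher information matrix of the parameter $(\theta,\eta)$ is defined as

$$I(\theta,\eta) = \int_{\mathbb{R}} \dot{l}^{\,T}_{\theta,\eta}(y)\, \dot{l}_{\theta,\eta}(y)\, dG_{\theta,\eta}(y), \tag{19}$$

where $G_{\theta,\eta}$ is the probability measure corresponding to the density $g(y;(\theta,\eta))$. The symbol $T$ denotes transposition, and all vectors are supposed to be row vectors.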

We assume that $\mathcal{M}_{k,m}(\theta,\eta)$ is a regular parametric model in the sense of the following definition.

Definition 3. Call our problem a regular deconvolution problem if

(A1) for all $(\theta,\eta) \in \Theta \times \Lambda$, $g(y;(\theta,\eta))$ is continuously differentiable in $(\theta,\eta)$ for $\lambda$-almost all $y$

(A2) $|\dot{l}(\theta,\eta)| \in L_2(\mathbb{R}, G_{\theta,\eta})$ for all $(\theta,\eta) \in \Theta \times \Lambda$

(A3) $I(\theta,\eta)$ is nonsingular for all $(\theta,\eta) \in \Theta \times \Lambda$ and continuous in $(\theta,\eta)$.

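This is a joint regularity condition, and it is stronger than the assumption that the model is regular in $\theta$ and $\eta$ separately. Let us write $I(\theta_0,\eta_0)$ in the block matrix form:

$$I(\theta_0,\eta_0) = \begin{pmatrix} I_{11}(\theta_0,\eta_0) & I_{12}(\theta_0,\eta_0) \\ I_{21}(\theta_0,\eta_0) & I_{22}(\theta_0,\eta_0) \end{pmatrix}, \tag{20}$$

where $I_{11}(\theta_0,\eta_0)$ is $k \times k$, $I_{12}(\theta_0,\eta_0)$ is $k \times m$, $I_{21}(\theta_0,\eta_0)$ is $m \times k$, and $I_{22}(\theta_0,\eta_0)$ is $m \times m$. Thus, denoting for simplicity of formulas $\Omega := [y : g(y;(\theta_0,\eta_0)) > 0]$, we can write explicitly

$$I_{11}(\theta_0,\eta_0) = E_{\theta_0,\eta_0}\, \dot{l}^{\,T}_{\theta_0} \dot{l}_{\theta_0} = \int_{\mathbb{R}} \dot{l}^{\,T}_{\theta_0}(y)\, \dot{l}_{\theta_0}(y)\, dG_{\theta_0,\eta_0}(y) \tag{21}$$

$$= \int_{\Omega} \frac{\left[\frac{\partial}{\partial\theta}\left(\int_{\mathbb{R}} f_\theta(s)\, h_{\eta_0}(y-s)\, ds\right)\Big|_{\theta=\theta_0}\right]^T \frac{\partial}{\partial\theta}\left(\int_{\mathbb{R}} f_\theta(s)\, h_{\eta_0}(y-s)\, ds\right)\Big|_{\theta=\theta_0}}{\int_{\mathbb{R}} f_{\theta_0}(s)\, h_{\eta_0}(y-s)\, ds}\, dy,$$

$$I_{12}(\theta_0,\eta_0) = E_{\theta_0,\eta_0}\, \dot{l}^{\,T}_{\theta_0} \dot{l}_{\eta_0} = \int_{\mathbb{R}} \dot{l}^{\,T}_{\theta_0}(y)\, \dot{l}_{\eta_0}(y)\, dG_{\theta_0,\eta_0}(y) \tag{22}$$

$$= \int_{\Omega} \frac{\left[\frac{\partial}{\partial\theta}\left(\int_{\mathbb{R}} f_\theta(s)\, h_{\eta_0}(y-s)\, ds\right)\Big|_{\theta=\theta_0}\right]^T \frac{\partial}{\partial\eta}\left(\int_{\mathbb{R}} f_{\theta_0}(s)\, h_{\eta}(y-s)\, ds\right)\Big|_{\eta=\eta_0}}{\int_{\mathbb{R}} f_{\theta_0}(s)\, h_{\eta_0}(y-s)\, ds}\, dy,$$

and analogously for $I_{21}(\theta_0,\eta_0)$ and $I_{22}(\theta_0,\eta_0)$. The efficient score function for $\theta$ in $\mathcal{M}_{k,m}(\theta,\eta)$ is defined as (see Bickel et al. (1993), p. 28):

$$l^*_{\theta_0}(y) = \dot{l}_{\theta_0}(y) - I_{12}(\theta_0,\eta_0)\, I_{22}^{-1}(\theta_0,\eta_0)\, \dot{l}_{\eta_0}(y), \tag{23}$$

and the efficient Fisher information matrix for $\theta$ in $\mathcal{M}_{k,m}(\theta,\eta)$ is defined as

$$I^*_{\theta_0} = \int_{\mathbb{R}} l^{*T}_{\theta_0}(y)\, l^*_{\theta_0}(y)\, dG_{\theta_0,\eta_0}(y). \tag{24}$$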



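Before closing this section we consider two simple examples.

Example 2. Suppose $\theta \in \mathbb{R}$, $\eta \in \mathbb{R}_+$ and, moreover, $\{f_\theta\}$ is the family $\{N(\theta, 1)\}$ of normal densities with mean $\theta$ and variance 1, and $\{h_\eta\}$ is the family $\{N(0, \eta^2)\}$. Then $g(\theta,\eta) = f_\theta * h_\eta \sim N(\theta, \eta^2 + 1)$. Let $\theta$ be the parameter of interest and $\eta$ the nuisance one. Let $H_0$ be $\theta = \theta_0$. By (17) and (18), for all $y$

$$\dot{l}_{\theta_0}(y) = \frac{y - \theta_0}{\eta_0^2 + 1}, \qquad \dot{l}_{\eta_0}(y) = \frac{(y - \theta_0)^2\, \eta_0}{(\eta_0^2 + 1)^2} - \frac{\eta_0}{\eta_0^2 + 1}. \tag{25}$$

By (22)

$$I_{12}(\theta,\eta) = \int_{\mathbb{R}} \frac{y - \theta}{\eta^2 + 1} \left[ \frac{(y - \theta)^2\, \eta}{(\eta^2 + 1)^2} - \frac{\eta}{\eta^2 + 1} \right] dN(\theta, \eta^2 + 1)(y) = 0$$

for all $\theta, \eta$. This means that adaptive estimation of $\theta$ is possible in this model, i.e., we can estimate $\theta$ equally well whether we know the true $\eta_0$ or not. However, we will not be concerned with estimation here. From (24) we get

$$I^*_{\theta_0} = \int_{\mathbb{R}} \frac{(y - \theta_0)^2}{(\eta_0^2 + 1)^2}\, dN(\theta_0, \eta_0^2 + 1)(y). \tag{26}$$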
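
The computations of Example 2 are easy to check numerically. The sketch below, in plain Python, takes the scores of the $N(\theta, \eta^2 + 1)$ model, namely $(y-\theta_0)/(\eta_0^2+1)$ for $\theta$ and $\eta_0 (y-\theta_0)^2/(\eta_0^2+1)^2 - \eta_0/(\eta_0^2+1)$ for $\eta$, and evaluates the blocks of the Fisher information by a Riemann sum. The function names, the integration grid and the values of $\theta_0$ and $\eta_0$ are arbitrary choices made for this illustration.

```python
import math

def scores(y, theta0, eta0):
    """Scores for theta and eta in the N(theta, eta^2 + 1) model at (theta0, eta0)."""
    v = eta0 ** 2 + 1.0
    score_theta = (y - theta0) / v
    score_eta = eta0 * (y - theta0) ** 2 / v ** 2 - eta0 / v
    return score_theta, score_eta

def information_blocks(theta0, eta0, half_width=12.0, steps=4000):
    """I_11, I_12, I_22 as expectations under N(theta0, eta0^2 + 1), by a Riemann sum."""
    v = eta0 ** 2 + 1.0
    sd = math.sqrt(v)
    step = 2.0 * half_width * sd / steps
    i11 = i12 = i22 = 0.0
    for i in range(steps + 1):
        y = theta0 - half_width * sd + i * step
        density = math.exp(-0.5 * (y - theta0) ** 2 / v) / math.sqrt(2.0 * math.pi * v)
        s_theta, s_eta = scores(y, theta0, eta0)
        i11 += s_theta * s_theta * density * step
        i12 += s_theta * s_eta * density * step
        i22 += s_eta * s_eta * density * step
    return i11, i12, i22

theta0, eta0 = 0.5, 1.5
i11, i12, i22 = information_blocks(theta0, eta0)
# Scalar version of the efficient information: I* = I_11 - I_12^2 / I_22.
efficient_information = i11 - i12 ** 2 / i22
```

Since the off-diagonal block vanishes, the efficient information coincides with $I_{11} = 1/(\eta_0^2 + 1)$, which is the value of the integral in the last display of Example 2.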

Example 3. Suppose now that we are interested in the parameter $\eta$ in the situation of Example 2 and the null hypothesis is $H_0: \eta = \eta_0$. There is a sort of symmetry between signal and noise: "what is a signal for one person is a noise for the other" (see also Remark 9). From Example 2 we know that the score function for $\eta$ at $\eta_0$ is given by (25). Since we proved for this example that $I_{12} = I_{21} = 0$, the efficient score function $l^*_{\eta_0}$ for $\eta$ at $\eta_0$ is given by (25) as well. We calculate now



The constant $C(\eta_0)$ in (27) can be expressed explicitly in terms of $\eta_0$, but this is not the point of this example. By the symmetry of $\theta$ and $\eta$ we have

Remark 9. Note that the problem is symmetric in $\theta$ and $\eta$ in the sense that it is possible to consider estimating and testing for each parameter, $\theta$ or $\eta$. Physically this means that from the noisy signal one can recover some "information" not only about the pure signal but also about the noise. This is actually natural, since noise is in fact also a signal: we are observing two signals at once. The price of this possibility is that, except for some trivial cases, one cannot recover full information about both the signal of interest and the noise.

8. Efficient score test 

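Let $l^*_{\theta_0}$ be defined by (23) and $I^*_{\theta_0}$ by (24). Note that both $l^*_{\theta_0}$ and $I^*_{\theta_0}$ depend (at least in principle) on the unknown nuisance parameter $\eta_0$. Let $\hat{l}^*_j$ and $\hat{L}$ be some estimators of $l^*_{\theta_0}(Y_j)$ and $(I^*_{\theta_0})^{-1}$, respectively. These estimators are supposed to depend only on the observable $Y_1, \ldots, Y_n$, but not on $X_1, \ldots, X_n$.

Definition 4. We say that $\hat{l}^*_j$ is a sufficiently good estimator of $l^*_{\theta_0}(Y_j)$ if for each $(\theta_0,\eta_0) \in \Theta \times \Lambda$ it holds that for every $\varepsilon > 0$

$$P_{\theta_0,\eta_0}\left( \left\| \frac{1}{\sqrt{n}} \sum_{j=1}^{n} \left( \hat{l}^*_j - l^*_{\theta_0}(Y_j) \right) \right\| \ge \varepsilon \right) \to 0 \quad \text{as } n \to \infty, \tag{28}$$

where $\|\cdot\|$ denotes the Euclidean norm of a given vector.

In other words, condition (28) means that the average $\frac{1}{n}\sum_{j=1}^{n} l^*_{\theta_0}(Y_j) \approx E_{\theta_0,\eta_0}\, l^*_{\theta_0}$ is consistently estimated. We illustrate this definition by some examples.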
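Example 2 (continued). We have (denoting the variance of $Y$ by $\sigma^2(Y)$):

$$l^*_{\theta_0}(Y_j) = \frac{Y_j - \theta_0}{\sigma^2(Y)}.$$

Define

$$\hat{l}^*_j := \frac{Y_j - \theta_0}{\hat\sigma_n^2},$$

where $\hat\sigma_n^2$ is any $\sqrt{n}$-consistent estimator of the variance of $Y$. One can take, for example, the sample variance $s_n^2 = s_n^2(Y_1, \ldots, Y_n)$ as such an estimate. Then, since by the model assumptions $\sigma^2(Y) > 0$, the $\hat{l}^*_j$ thus constructed satisfies Definition 4. See the Appendix for the proof. □

Example 3 (continued). We have in this case

$$l^*_{\eta_0}(Y_j) = \frac{\eta_0}{(\eta_0^2 + 1)^2}\, (Y_j - \theta_0)^2 - \frac{\eta_0}{\eta_0^2 + 1}.$$

For simplicity of notation we write $l^*_{\eta_0}(Y_j) = C_1(\eta_0)(Y_j - \theta_0)^2 - C_2(\eta_0)$. Let $\hat\theta_n$ be any $\sqrt{n}$-consistent estimate of $\theta_0$ and put $\hat{l}^*_j := C_1(\eta_0)(Y_j - \hat\theta_n)^2 - C_2(\eta_0)$. Then Definition 4 is satisfied in this example also. This is proved in the Appendix. □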

Definition 4 reflects the basic idea of the method of estimated scores. This method is widely used in statistics (see Ibragimov and Has'minskii (1981), Schick (1986), Bickel et al. (1993), Inglot and Ledwina (2006) and others). These authors show that for different problems it is possible to construct nontrivial parametric, semi- and nonparametric estimators of scores such that these estimators satisfy (28).



Definition 5. Define

$$W_k = \left\{ \frac{1}{\sqrt{n}} \sum_{j=1}^{n} \hat{l}^*_j \right\} \hat{L} \left\{ \frac{1}{\sqrt{n}} \sum_{j=1}^{n} \hat{l}^*_j \right\}^T,$$

where $\hat{L}$ is an estimate of $(I^*_{\theta_0})^{-1}$ depending only on $Y_1, \ldots, Y_n$. Note that $\hat{l}^*_j$ is a $k$-dimensional vector and $\hat{L}$ is a $k \times k$ matrix. We call $W_k$ the efficient score test statistic for testing $H_0: \theta = \theta_0$ in $\mathcal{M}_{k,m}(\theta,\eta)$. It is assumed that the null hypothesis is rejected for large values of $W_k$.

Normally it should be possible to construct reasonably good estimators $\hat\eta_n$ of $\eta$ by standard methods, since at this point our construction is parametric. After that it would be enough to plug these estimates into (23) and get the desired $\hat{l}^*_j$'s satisfying (28).

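Example 2 (continued). Let $\hat\sigma_n^2$ be any $\sqrt{n}$-consistent estimate of $\eta_0^2 + 1$ such that this estimate is based on $Y_1, \ldots, Y_n$. Then by (26) and Definition 5 the efficient score test statistic for testing $H_0: \theta = \theta_0$ (in the model $\mathcal{M}_{1,1}(\theta,\eta)$) is

$$W_1 = \frac{1}{\hat\sigma_n^2} \left( \frac{1}{\sqrt{n}} \sum_{j=1}^{n} (Y_j - \theta_0) \right)^2.$$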
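
For Example 2 the statistic is simple enough to compute directly. The sketch below assumes the quadratic form $W_1 = \left(\sum_j (Y_j - \theta_0)\right)^2 / (n \hat\sigma_n^2)$, which is what one obtains by plugging the estimated scores $(Y_j - \theta_0)/\hat\sigma_n^2$ and $\hat{L} = \hat\sigma_n^2$ into the quadratic form defining $W_k$, and it takes the sample variance as $\hat\sigma_n^2$. The function names and the toy data are hypothetical; the p-value uses the $\chi^2_1$ limit stated in Theorem 11.

```python
import math
import statistics

def efficient_score_statistic(ys, theta0):
    """W_1 for H_0: theta = theta0 in the normal model of Example 2."""
    n = len(ys)
    var_hat = statistics.variance(ys)  # sample variance, estimates eta0^2 + 1
    total = sum(y - theta0 for y in ys)
    return total ** 2 / (n * var_hat)

def chi2_1_pvalue(w):
    """Upper tail P(chi^2_1 > w), using chi^2_1 = Z^2 with Z standard normal."""
    return math.erfc(math.sqrt(w / 2.0))

w1 = efficient_score_statistic([1.0, 2.0, 3.0, 4.0], theta0=2.0)  # toy data
p_value = chi2_1_pvalue(w1)
```

The null hypothesis would be rejected at level $\alpha$ when `p_value` falls below $\alpha$. With only four observations the asymptotic approximation is of course not to be trusted; the data here only exercise the formula.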



Example 3 (continued). Using any $\sqrt{n}$-consistent estimate $\hat\theta$ of $\theta$, we get the efficient score test statistic



Remark 10. We make the following remark to avoid possible confusion. For the simple deconvolution we had the score test statistic, and now we have the efficient score test statistic. This does not mean that the statistic for simple deconvolution is "inefficient". Here the word "efficient" has a strictly technical meaning. Because of the presence of the nuisance parameter, we have to extract information about the parameter of interest, and we want to do this efficiently in some sense. This is the explanation of the terminology.

The following theorem describes the asymptotic behavior of $W_k$ under the null hypothesis.

Theorem 11. Assume that the null hypothesis $H_0: \theta = \theta_0$ holds true, (A1)-(A3) are fulfilled, (28) is satisfied, and $\hat{L}$ is any consistent estimate of $(I^*_{\theta_0})^{-1}$. Then

$$W_k \to_d \chi^2_k \quad \text{as } n \to \infty,$$

where $\chi^2_k$ denotes a random variable with central chi-square distribution with $k$ degrees of freedom.

9. Selection rule 

In this section we extend the construction of Section 4 to the case of composite hypotheses. First we give a general definition of a selection rule.

Denote by $\mathcal{M}_{k,m}(\theta,\eta)$ the model described in Section 6 such that the true parameter $(\theta,\eta)$ belongs to a parameter set, say $\Theta_k \times \Lambda$, and $\dim \Theta_k = k$. By a nested family of submodels $\mathcal{M}_{k,m}(\theta,\eta)$ for $k = 1, 2, \ldots$ we mean a sequence of these models such that for their parameter sets it holds that $\Theta_1 \times \Lambda \subset \Theta_2 \times \Lambda \subset \ldots$.

Definition 6. Consider a nested family of submodels $\mathcal{M}_{k,m}(\theta,\eta)$ for $k = 1, \ldots, d$, where $d$ is fixed but otherwise arbitrary, and $m$ is fixed. Choose a function $\pi(\cdot,\cdot): \mathbb{N} \times \mathbb{N} \to \mathbb{R}$, where $\mathbb{N}$ is the set of natural numbers. Assume that $\pi(1,n) < \pi(2,n) < \ldots < \pi(d,n)$ for all $n$ and $\pi(j,n) - \pi(1,n) \to \infty$ as $n \to \infty$ for every $j = 2, \ldots, d$. Call $\pi(j,n)$ a penalty attributed to the $j$-th model $\mathcal{M}_j(\theta)$ and the sample size $n$. Then a selection rule $S(\hat{l}^*)$ for the test statistic $W_k$ is an integer-valued random variable satisfying the condition

$$S(\hat{l}^*) = \min\{k : 1 \le k \le d;\; W_k - \pi(k,n) \ge W_j - \pi(j,n),\; j = 1, \ldots, d\}. \tag{32}$$

We call the random variable $W_S$ a data-driven efficient score test statistic for testing validity of the initial model. In this paper we also assume that the following condition holds.

(S1) for every fixed $k \ge 1$ it holds that $\pi(k,n) = o(n)$ as $n \to \infty$.

Unlike the case of the simple null hypothesis, in the case of composite hypotheses the selection rule depends on the estimator $\hat{l}^*_j$ of the unknown values $l^*_{\theta_0}(Y_j)$ of the efficient score function. This means that we need to estimate the nuisance parameter $\eta$, or the corresponding scores, or their sum. A surprising result follows from Theorem 12 below: for our problem many possible penalties and, moreover, essentially all sensible estimators plugged into $W_k$ give consistent selection rules. Possible choices of penalties are, for instance, Schwarz's penalty $\pi(j,n) = j \log n$, or Akaike's penalty $\pi(j,n) = j$.

Denote by $P_{\theta_0,\eta_0}$ the probability measure corresponding to the case when $X_1, \ldots, X_n$ all have the density $f(\theta_0,\eta_0)$. The main result about the asymptotic null distribution of $W_S$ is the following theorem (it is proved analogously to Theorem 3).

Theorem 12. Under the conditions of Theorem 11, as $n \to \infty$ it holds that

$$P_{\theta_0,\eta_0}\left(S(\hat{l}^*) > 1\right) \to 0 \quad \text{and} \quad W_S \to_d \chi^2_1.$$

Condition (28) is what makes this direct reference to the case of the simple hypothesis possible. Estimation of the efficient score function $l^*_{\theta_0}$ can be done in different ways. The first way is to estimate the whole expression on the right-hand side of (23). For this method of estimation condition (28) is natural. The second and probably more convenient method of estimating $l^*_{\theta_0}$ is via estimation of the nuisance parameter $\eta$ by some estimator $\hat\eta$. But for this approach condition (28) becomes something that has to be proved for each particular estimator. We hope that this inconvenience is excused by the fact that we are only introducing the new test here. It is possible to reformulate condition (28) explicitly in terms of conditions on $\hat\eta$, $\{f_\theta\}$, and $\{h_\eta\}$ (see an analogue in Inglot et al. (1997)).

Remark 13. The selection rule $S(l^*)$ can be modified in order to make it possible to choose not only models of dimension less than some fixed $d$, but to allow arbitrarily large dimensions of $M_{k,m}(\theta,\eta)$ as the number of observations grows.
See Remark m 

Remark 14. It is possible to modify the definition of the selection rule so that both dimensions $k$ and $m$ are selected by the test from the data. The corresponding test statistic will be of the form $W_S$, where this time $S = (S_1, S_2)$. Proofs of the asymptotic properties of this statistic are analogous to those presented in this paper. Such a statistic could be useful, since noise of unknown dimension often seems to be the more realistic situation. On the other hand, this statistic also has some disadvantages. One has to impose stricter assumptions on both signal and noise (including an analogue of the double-identifiability assumption), and the final result is weaker than the result of this section. This is the price for attempting to extract information about a larger number of parameters from the same observations $Y_1, \ldots, Y_n$.

10. Consistency of tests 

Let $F$ be the true distribution function of $X$ and $H$ the true distribution function of $\varepsilon$. Here $F$ and $H$ are not necessarily parametric, and possibly these distribution functions do not have densities with respect to the Lebesgue measure $\lambda$. For every $k \le d$ let us choose an auxiliary parametric family $\{f_\theta\}$, $\theta \in \Theta \subseteq \mathbb{R}^k$, such that $f_{\theta_0}$ from this family coincides with $f_0$ from the null hypothesis $H_0$. Correspondingly, let us fix an integer $m$ and choose an auxiliary parametric family $\{h_\eta\}$, $\eta \in \Lambda \subseteq \mathbb{R}^m$. Suppose that the chosen families $\{f_\theta\}$ and $\{h_\eta\}$ give us a regular deconvolution problem in the sense of Definition 3. Then one is able to construct the score test statistic $W_k$ defined by (29), despite the fact that the true $F$ and $H$ possibly have no relation to the chosen $\{f_\theta\}$ and $\{h_\eta\}$. Our goal in this section is to determine under what conditions the statistic $W_k$ built in this way is consistent for testing against $H_A$.
Suppose that the following condition holds.

(C1) there exists an integer $K \ge 1$ such that $K \le d$ and

$$E_{F*H}\, l^*_{\theta_0,(1)} = 0, \ \ldots, \ E_{F*H}\, l^*_{\theta_0,(K-1)} = 0, \quad E_{F*H}\, l^*_{\theta_0,(K)} = C_K \ne 0,$$

where $l^*_{\theta_0,(j)}$ is the $j$-th coordinate function of $l^*_{\theta_0}$, $l^*_{\theta_0}$ is defined by (23), $d$ is the maximal possible dimension of our model as in Definition 3 of Section 9, and $E_{F*H}$ denotes the mathematical expectation with respect to $F * H$.

Condition (C1) is a weak analogue of nondegeneracy: if (C1) fails for all $k$, then $F$ is orthogonal to the whole system $\{l^*_{\theta_0,(j)}\}_{j=1}^{\infty}$, and if this system is complete, then $F * H$ is degenerate. Condition (C1) is also related to the identifiability of the model: if the model is not identifiable, then $F * H = F_0 * H$ can happen and (C1) fails. Establishing identifiability for parametric deconvolution is not trivial (see, e.g., Sclove and Van Ryzin (1969)). It is also important to note that although (C1) has something in common with both nondegeneracy and identifiability, it is in general quite far from both of these notions.
The main result of this section is the following. 

Theorem 15. If (28) is satisfied and $L$ is any consistent estimate of $(I^*_{\theta_0})^{-1}$, then

1. the test based on $W_k$ is consistent for testing against all alternative distributions $F$, $H$ such that (C1) is satisfied with $K \le k$;

2. the test based on $W_k$ is inconsistent for testing against alternative distributions $F$, $H$ such that (C1) is satisfied with $K > k$;

3. if the selection rule $S(l^*)$ satisfies (S1), then the test based on $W_S$ is consistent against all alternative distributions $F * H$ such that (C1) is satisfied with some $K$.

Part 2 of Theorem 15 shows why it is important to choose a suitable model dimension. We now give two specific examples.

Example 2 (continued). By Theorem 15, the test based on $W_1$ is consistent if and only if for the true $F$ and $H$ it holds that

$$\frac{1}{\eta_0^2 + 1}\, E_{F*H}(Y - \theta_0) \ne 0, \quad \text{i.e.} \quad E_{F*H}(Y) \ne \theta_0. \qquad (33)$$

For example, $W_1$ does not work when the true $H$ is symmetric about 0 and the true $F \ne F_0$ has mean equal to $\theta_0$.
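Condition (33) can be illustrated numerically. The sketch below assumes that $W_1$ has the form $n(\bar Y - \theta_0)^2/\hat\sigma^2$ with $\hat\sigma^2$ the sample variance of $Y$; this form is suggested by (33) but is an assumption here, since the definition of $W_1$ lies outside this section. The distributions used (uniform signal, uniform symmetric noise), the choice $\theta_0 = 0$ and all function names are hypothetical choices made only for the illustration, and the null model is assumed not to be of this form.

```python
import random
import statistics


def w1_mean_statistic(y, theta0):
    """Assumed form of W_1 for the mean: n * (Ybar - theta0)^2 / sigma_hat^2."""
    n = len(y)
    ybar = statistics.mean(y)
    sigma2_hat = statistics.variance(y)  # sample variance of Y
    return n * (ybar - theta0) ** 2 / sigma2_hat


def simulate_w1(n, mean_of_x, seed=0):
    """Draw Y = X + eps and return the assumed W_1 for theta0 = 0."""
    rng = random.Random(seed)
    y = [
        rng.uniform(mean_of_x - 1.0, mean_of_x + 1.0)  # X: uniform, mean = mean_of_x
        + rng.uniform(-1.0, 1.0)                       # eps: symmetric about 0
        for _ in range(n)
    ]
    return w1_mean_statistic(y, theta0=0.0)


if __name__ == "__main__":
    for n in (100, 1000, 10000):
        # first value: E(Y) = theta0, second value: E(Y) != theta0
        print(n, simulate_w1(n, mean_of_x=0.0), simulate_w1(n, mean_of_x=0.5))
```

In this sketch the statistic stays bounded in probability when $E(Y) = \theta_0$ even though the signal is not normal, and it grows roughly linearly in $n$ when $E(Y) \ne \theta_0$, which is the behaviour described by (33).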

Example 3 (continued). By Theorem 15, $W_1$ is consistent if and only if for the true $F$ and $H$ the expectation of the efficient score under $F * H$ is nonzero, i.e.

$$E_{F*H}(Y - \theta)^2 \ne \eta_0^2 + 1, \quad \text{or equivalently} \quad \mathrm{Var}_{F*H}\, Y \ne \mathrm{Var}_{F_0*H_0}\, Y. \qquad (34)$$

Note that condition (33) can be interpreted as "$W_1$ is consistent for testing the hypothesis about the mean in this model iff the expectation of $Y$ under the alternative is different from the expectation under the null hypothesis" and (34) as "$W_1$ is consistent for testing the hypothesis about the variance in this model iff the variance of $Y$ under the alternative is different from the variance under the null hypothesis". One cannot expect more from such a simple test as $W_1$. On the contrary, the data-driven test statistic $W_S$ provides a consistent testing procedure.
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Appendix. 

Proof. (Theorem 1). We calculated the efficient score vector in (11) and the displays following it. By Proposition 1, p. 13 of Bickel et al. (1993) and our regularity assumptions, the matrix $L$ exists and is positive definite and nondegenerate of rank $k$. Under (B1)-(B3), $E_0\, l^*(Y) = 0$ (see Bickel et al. (1993), p. 15), and our statement follows. □
Proof. (Proposition!!]). Follows by the multivariate Central Limit Theorem. □ 
Proof. (Theorem 3). Denote $\Delta(k,n) := \pi(k,n) - \pi(1,n)$. For any $k = 2, \ldots, d$

$$P_0^n(S = k) \le P_0^n\big(U_k - \pi(k,n) \ge U_1 - \pi(1,n)\big) \le P_0^n\big(U_k \ge \pi(k,n) - \pi(1,n)\big) = P_0^n\big(U_k \ge \Delta(k,n)\big).$$

By Theorem 1, $U_k \to_d \chi^2_k$ as $n \to \infty$; thus for $\Delta(k,n) \uparrow \infty$ as $n \to \infty$ we have $P_0^n(U_k \ge \Delta(k,n)) \to 0$ as $n \to \infty$, so for any $k = 2, \ldots, d$ we have $P_0^n(S = k) \to 0$ as $n \to \infty$. This proves that

$$P_0^n(S \ge 2) = \sum_{k=2}^{d} P_0^n(S = k) \to 0, \quad n \to \infty,$$

and so $P_0^n(S = 1) \to 1$. Now write for arbitrary real $t > 0$

$$P_0^n(|U_S - U_1| > t) = P_0^n(|U_1 - U_1| > t;\ S = 1) + \sum_{m=2}^{d} P_0^n(|U_m - U_1| > t;\ S = m) = \sum_{m=2}^{d} P_0^n(|U_m - U_1| > t;\ S = m). \qquad (35)$$

For $m = 2, \ldots, d$ we have $P_0^n(S = m) \to 0$, so

$$0 \le \sum_{m=2}^{d} P_0^n(|U_m - U_1| > t;\ S = m) \le \sum_{m=2}^{d} P_0^n(S = m) \to 0$$

as $n \to \infty$, and thus by (35) it follows that $U_S$ tends to $U_1$ in probability as $n \to \infty$. But $U_1 \to_d \chi^2_1$ by Theorem 1, so $U_S \to_d \chi^2_1$ as $n \to \infty$. □

We shall use the following standard lemma from linear algebra.

Lemma 16. Let $x \in \mathbb{R}^k$, and let $A$ be a $k \times k$ positive definite matrix; if for some real number $\delta > 0$ we have $A > \delta$ (in the sense that the matrix $(A - \delta I_{k \times k})$ is positive definite, where $I_{k \times k}$ is the $k \times k$ identity matrix), then for all $x \in \mathbb{R}^k$ it holds that $x A x^T \ge \delta \|x\|^2$.
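Lemma 16 can be checked numerically in the simplest case. The sketch below treats a symmetric $2 \times 2$ matrix, for which the smallest eigenvalue has a closed form; the helper names and the example matrix are hypothetical and serve only as an illustration.

```python
import math


def min_eigenvalue_2x2(a, b, c):
    """Smallest eigenvalue of the symmetric matrix [[a, b], [b, c]]."""
    mean = (a + c) / 2.0
    radius = math.hypot((a - c) / 2.0, b)
    return mean - radius


def quadratic_form_2x2(a, b, c, x):
    """Value of x A x^T for A = [[a, b], [b, c]] and x = (x1, x2)."""
    x1, x2 = x
    return a * x1 * x1 + 2.0 * b * x1 * x2 + c * x2 * x2


def check_lemma(a, b, c, delta, points):
    """Return True if x A x^T >= delta * ||x||^2 at every point in `points`."""
    return all(
        quadratic_form_2x2(a, b, c, x) >= delta * (x[0] ** 2 + x[1] ** 2) - 1e-12
        for x in points
    )
```

For $A$ with rows $(2, 1)$ and $(1, 2)$ the smallest eigenvalue is 1, so the inequality holds at every point for any $\delta < 1$, while for $\delta = 1.5$ it fails, for example, at $x = (1, -1)$.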

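Proof. (Proposition 5). From (D1) by the law of large numbers we get

$$\frac{1}{n} \sum_{j=1}^{n} l^*_{(i)}(Y_j) \to_{P_F} 0, \quad 1 \le i \le K-1, \qquad \frac{1}{n} \sum_{j=1}^{n} l^*_{(K)}(Y_j) \to_{P_F} C_K \ne 0. \qquad (36)$$

We apply Lemma 16 to the matrix $L$ defined earlier; since all the eigenvalues of $L$ are positive, we can choose $\delta$ to be any fixed positive number less than the smallest eigenvalue of $L$. We obtain the following inequality:

$$U_K = \Big\{\frac{1}{\sqrt n} \sum_{j=1}^{n} l^*(Y_j)\Big\}\, L\, \Big\{\frac{1}{\sqrt n} \sum_{j=1}^{n} l^*(Y_j)\Big\}^T \ge \delta \sum_{i=1}^{K} \Big(\frac{1}{\sqrt n} \sum_{j=1}^{n} l^*_{(i)}(Y_j)\Big)^2 \ge \delta\, n\, \Big(\frac{1}{n} \sum_{j=1}^{n} l^*_{(K)}(Y_j)\Big)^2. \qquad (38)$$

Now by (36) and (38) we get for all $s \in \mathbb{R}$

$$P_F(U_K \le s) \le P_F\Big(\delta\, n\, \Big(\frac{1}{n} \sum_{j=1}^{n} l^*_{(K)}(Y_j)\Big)^2 \le s\Big) \to 0 \quad \text{as } n \to \infty,$$

and this proves the Proposition. □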



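Proof. (Proposition 7). Let $\pi(k,n)$ and $\Delta(k,n)$ be defined as before. For any $i = 1, \ldots, K-1$ we have

$$P_F(S = i) \le P_F\big(U_i - \pi(i,n) \ge U_K - \pi(K,n)\big) = P_F\big(U_i \ge U_K - (\pi(K,n) - \pi(i,n))\big). \qquad (39)$$

By (36) and (38) we get

$$P_F\Big(U_K \ge \delta \frac{C_K^2}{2}\, n\Big) \to 1 \quad \text{as } n \to \infty. \qquad (40)$$

Note that

$$P_F\big(U_i \ge U_K - (\pi(K,n) - \pi(i,n))\big) \le P_F\Big(U_i \ge \delta \frac{C_K^2}{2}\, n - (\pi(K,n) - \pi(i,n));\ U_K \ge \delta \frac{C_K^2}{2}\, n\Big) + P_F\Big(U_K < \delta \frac{C_K^2}{2}\, n\Big). \qquad (41)$$

Since by (S1) it holds that $\pi(K,n) - \pi(i,n) = o(n)$, we get for all sufficiently large $n$

$$P_F\Big(U_i \ge \delta \frac{C_K^2}{2}\, n - (\pi(K,n) - \pi(i,n));\ U_K \ge \delta \frac{C_K^2}{2}\, n\Big) \le P_F\Big(U_i \ge \delta \frac{C_K^2}{2}\, n - (\pi(K,n) - \pi(i,n))\Big) \le P_F\Big(U_i \ge \delta \frac{C_K^2}{4}\, n\Big) \to 0 \qquad (42)$$

as $n \to \infty$ by Chebyshev's inequality, since by Proposition 6 we have $U_i \to_d \chi^2_i$ as $n \to \infty$ for all $i = 1, \ldots, K-1$. Substituting (40) and (42) into (41) we get $P_F(S = i) \to 0$ as $n \to \infty$ for all $i = 1, \ldots, K-1$. This means that $P_F(S \ge K) \to 1$ as $n \to \infty$.
Now write for $t \in \mathbb{R}$

$$P_F(U_S \le t) = P_F(U_S \le t;\ S \le K-1) + P_F(U_S \le t;\ S \ge K) =: R_1 + R_2.$$

But $R_1 \to 0$ since $P_F(S = i) \to 0$ for $i = 1, \ldots, K-1$ and $K \le d < \infty$. Since $U_{l_1} \ge U_{l_2}$ for $l_1 \ge l_2$, we get

$$R_2 = \sum_{l=K}^{d} P_F(U_l \le t;\ S = l) \le \sum_{l=K}^{d} P_F(U_K \le t) \to 0$$

as $n \to \infty$ by Proposition 5. Thus $P_F(U_S \le t) \to 0$ as $n \to \infty$ for all $t \in \mathbb{R}$. □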

Proof. (Theorem 8). Part 1 follows from Theorem 1 and Proposition 5, part 2 from Theorem 1 and Proposition 6, part 3 from Theorem 3 and Proposition 7. □

imsart-generic ver. 2007/04/13 file: Decoiivolutioii_Arxiv.tex date: February 1, 2008 



M. Langovoy /Score tests for deconvolution. 

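Proof. (The statement about $l^*$ from Example 2). Indeed,

$$\Big|\frac{1}{\sqrt n} \sum_{i=1}^{n} \big(l^*_i - l^*_{\theta_0}(Y_i)\big)\Big| = \Big|\frac{1}{\sqrt n} \sum_{i=1}^{n} (Y_i - \theta_0) \Big(\frac{1}{\hat\sigma^2} - \frac{1}{\sigma^2(Y)}\Big)\Big| = \sqrt n\, \Big|\frac{1}{\hat\sigma^2} - \frac{1}{\sigma^2(Y)}\Big|\, |\bar Y - \theta_0| .$$

But $|\bar Y - \theta_0| = |\bar Y - EY| \to 0$ in $G_{\theta_0,\eta_0}$-probability, therefore the required Definition is satisfied if $\sqrt n\, |1/\hat\sigma^2 - 1/\sigma^2(Y)|$ is bounded in $G_{\theta_0,\eta_0}$-probability, and this holds if $\hat\sigma^2$ is a $\sqrt n$-consistent estimate of $\sigma^2(Y)$. Here $\bar Y$ denotes the sample mean $\bar Y = \frac{1}{n} \sum_{i=1}^{n} Y_i$. □

Proof. (The statement about $l^*$ from Example 3).

$$\Big|\frac{1}{\sqrt n} \sum_{i=1}^{n} \big(l^*_i - l^*_{\theta_0}(Y_i)\big)\Big| = |C_1(\eta_0)|\, \Big|\frac{1}{\sqrt n} \sum_{i=1}^{n} \big((Y_i - \theta_n)^2 - (Y_i - \theta_0)^2\big)\Big|$$

$$= |C_1(\eta_0)|\, \Big|\frac{1}{\sqrt n} \sum_{i=1}^{n} (\theta_n - \theta_0)(-2 Y_i + \theta_n + \theta_0)\Big|$$

$$= |C_1(\eta_0)|\, \sqrt n\, |\theta_n - \theta_0|\, \big|(\bar Y - \theta_n) + (\bar Y - \theta_0)\big|$$

$$\le |C_1(\eta_0)|\, \sqrt n\, |\theta_n - \theta_0|\, \big(|\bar Y - \theta_n| + |\bar Y - \theta_0|\big) \to 0$$

in $G_{\theta_0,\eta_0}$-probability, since for $n \to \infty$ it holds that $|\bar Y - \theta_n| \to 0$ and $|\bar Y - \theta_0| \to 0$, both in $G_{\theta_0,\eta_0}$-probability, and $\sqrt n\, |\theta_n - \theta_0|$ is bounded in $G_{\theta_0,\eta_0}$-probability. □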

Proof. (Theorem 11). Put

$$V_k = \Big\{\frac{1}{\sqrt n} \sum_{i=1}^{n} l^*_{\theta_0}(Y_i)\Big\}\, (I^*_{\theta_0})^{-1}\, \Big\{\frac{1}{\sqrt n} \sum_{i=1}^{n} l^*_{\theta_0}(Y_i)\Big\}^T, \qquad (43)$$

where $l^*_{\theta_0}$ is defined by (23) and $I^*_{\theta_0}$ by (21). Of course, $V_k$ is not a statistic, since it depends on the unknown $\eta_0$. But if the true $\eta_0$ is known, then because of (B1)-(B3) we can apply the multivariate Central Limit Theorem and obtain $V_k \to_d \chi^2_k$ as $n \to \infty$. Condition (28) implies that

$$\frac{1}{\sqrt n} \sum_{j=1}^{n} \big(l^*_j - l^*_{\theta_0}(Y_j)\big) \to 0 \quad \text{in } G_{\theta_0,\eta_0}\text{-probability as } n \to \infty,$$

and by consistency of $L$ we get the statement of the theorem by Slutsky's Lemma. □

Proof. (Theorem 15). Because of condition (28) the proof is analogous to the proof of Theorem 8. Indeed, after an obvious change of notation, Propositions 5, 6 and 7 are true for $W_k$, $W_{S(l^*)}$, $S(l^*)$ instead of $U_k$, $U_S$, $S$. The proofs of the new versions of the propositions are analogous to the proofs of the previous versions. The only difference is that the proof of the key inequality analogous to (38) requires the use of the following lemma.

Lemma 17. Let $A$ be a $k \times k$ positive definite matrix and let $\{A_n\}_{n=1}^{\infty}$ be a sequence of $k \times k$ matrices such that $A_n \to A$ in the Euclidean matrix norm. Suppose that for some real number $\delta > 0$ we have $A > \delta$ in the sense that the matrix $(A - \delta I_{k \times k})$ is positive definite, where $I_{k \times k}$ is the $k \times k$ identity matrix. Then for all sufficiently large $n$ it holds that $A_n > \delta$.

□ 
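A short argument for Lemma 17 can be sketched as follows. It is a standard perturbation estimate, written out under the assumption that "$A_n > \delta$" means $x A_n x^T > \delta \|x\|^2$ for all $x \ne 0$; it is not taken from the paper, and the constant $\mu$ is introduced only for this sketch.

```latex
% \mu is the minimum of the quadratic form of A over the unit sphere (a compact set);
% positive definiteness of A - \delta I gives \mu > \delta.
\mu := \min_{\|x\| = 1} x A x^T > \delta .
% The Euclidean (Frobenius) matrix norm bounds the quadratic form of A_n - A:
|x (A_n - A) x^T| \le \|A_n - A\| \, \|x\|^2 .
% Hence, for every x \ne 0,
x A_n x^T = x A x^T + x (A_n - A) x^T
          \ge (\mu - \|A_n - A\|) \, \|x\|^2 > \delta \, \|x\|^2
% as soon as \|A_n - A\| < \mu - \delta, which holds for all sufficiently large n.
```

The argument uses only the convergence $A_n \to A$ and the strict gap between $\mu$ and $\delta$, so no symmetry of the matrices $A_n$ is needed in this sketch.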
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